A New Approach for Automatic Thesaurus Construction and Query Expansion for Document Retrieval
Abstract
In this paper, we present a new approach to automatic thesaurus construction and query expansion for document retrieval. To construct the thesaurus automatically, we analyze the information between any two terms in each document cluster center, taken from either the final document clusters or the intermediate document clusters of the clustering process. This information includes the co-occurrence frequency of the two terms in each document cluster center, the degree of effect of each term in each document cluster center, and the inner noise of each document cluster. We also present a query expansion method to expand the user's queries and a new method to calculate the degree of similarity between the user's query and documents. The proposed thesaurus construction and query expansion methods can improve the performance of information retrieval systems for document retrieval.
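The general idea of a co-occurrence thesaurus followed by query expansion can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not give its formulas, so the association score (the smaller of two term weights in a cluster center), the `top_k` cutoff, and the cosine similarity below are all assumptions, and the inner-noise factor is left out entirely. The function names and the sample cluster centers are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_thesaurus(cluster_centers):
    """Build a term-term association table from cluster centers.

    cluster_centers: list of dicts mapping term -> weight in that center.
    Assumed score: two terms that co-occur in a center are associated by
    the smaller of their two weights, summed over all centers.
    """
    assoc = defaultdict(float)
    for center in cluster_centers:
        for a, b in combinations(sorted(center), 2):
            score = min(center[a], center[b])
            assoc[(a, b)] += score
            assoc[(b, a)] += score
    return assoc


def expand_query(query_terms, assoc, top_k=2):
    """Append the top_k terms most strongly associated with the query."""
    candidates = defaultdict(float)
    for (a, b), score in assoc.items():
        if a in query_terms and b not in query_terms:
            candidates[b] += score
    # Sort by descending score, then alphabetically to break ties.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in ranked[:top_k]]


def cosine_similarity(query_terms, doc_weights):
    """Cosine similarity between a binary query vector and a weighted document."""
    dot = sum(doc_weights.get(t, 0.0) for t in set(query_terms))
    norm_q = math.sqrt(len(set(query_terms)))
    norm_d = math.sqrt(sum(w * w for w in doc_weights.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


# Hypothetical cluster centers, for illustration only.
centers = [
    {"thesaurus": 0.8, "query": 0.6, "expansion": 0.5},
    {"cluster": 0.9, "document": 0.7, "query": 0.2},
]
expanded = expand_query(["query"], build_thesaurus(centers))
print(expanded)
```

Under these assumptions, the expanded query picks up the terms that share a cluster center with "query" at the highest weights, and the expanded term list can then be scored against each document with `cosine_similarity`.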
Similar resources
Improving Retrieval by a Similarity Thesaurus based on Hyperlink Structure
One strategy to enhance the retrieval effectiveness of search engines is to apply automatic query expansion. For this purpose a similarity thesaurus may be applied in order to find new search terms. The similarity thesaurus may be constructed using a model for term comparison. Common methods to define term distances are based on the occurrence frequencies of terms in documents. In this article ...
QEA: A New Systematic and Comprehensive Classification of Query Expansion Approaches
A major problem in information retrieval is the difficulty of defining the user's information needs; on the other hand, when the user submits a query, there is a vast amount of information to retrieve. Different methods have therefore been suggested for query expansion, which is concerned with reconfiguring the query by increasing efficiency and improving the criterion accuracy in the information...
Semi-Automatic Indexing of Multilingual Documents
With the growing significance of digital libraries and the Internet, more and more electronic texts become accessible to a wide and geographically dispersed public. This requires adequate tools to facilitate indexing, storage, and retrieval of documents written in different languages. We present a method for semi-automatic indexing of electronic documents and construction of a multilingual thesa...
Automatische Thesauruserstellung und Query Expansion in einer E-Commerce-Anwendung
This work describes a method for the automatic construction of a thesaurus based on existing categories of documents. A clustering algorithm, “the layer seeds method”, is introduced, which facilitates the automatic generation of a thesaurus reflecting the specific vocabulary occurring in a given collection of documents. We assume that the collection is partitioned into document categories. The ...
Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology
Due to the growth of the web, there are many challenges in establishing a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...
Publication date: 2007